1. Presentation of the case


2. Data exploration

As mentioned above, the main data was sourced from the opendata.swiss project. It was collected and published by the Open Data Portal of the City Council of Zürich under the title “Hundebestände der Stadt Zürich, seit 2015” (Dog population of the City of Zürich, since 2015). The original source describes the data set as follows:

This dataset contains information on dogs and their owners from the municipal dog register since 2015. Information on the age group, gender and statistical district of residence is provided for dog owners. The breed, breed type, sex, year of birth, age and color are recorded for each dog. The dog register is kept by the Dog Control Department of the Zurich City Police.

To keep the workflow seamless and make the variables easier to interpret within our group, the column names and certain string values were translated from the original German into English.
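String values can be translated with a named lookup vector. The sketch below is a minimal illustration, assuming the translation is applied column by column: the mapping `sex_en` is hypothetical and covers only the German sex labels (männlich, weiblich) that appear in the `str()` output, and `translate_values()` is an illustrative helper rather than part of the original workflow. Values without an entry in the lookup are kept unchanged instead of turning into `NA`.

```r
# Hypothetical lookup: German value -> English value (sex labels only)
sex_en <- c("m\u00e4nnlich" = "male", "weiblich" = "female")

# Translate a character vector; values missing from the lookup stay as they are
translate_values <- function(x, lookup) {
  out <- unname(lookup[x])
  ifelse(is.na(out), x, out)
}

# Toy rows mimicking two columns of df.dogs (illustrative values only)
toy <- data.frame(
  OwnerSexText = c("m\u00e4nnlich", "weiblich"),
  DogSexText   = c("weiblich", "weiblich"),
  stringsAsFactors = FALSE
)

for (col in c("OwnerSexText", "DogSexText")) {
  toy[[col]] <- translate_values(toy[[col]], sex_en)
}
toy
```

`stringsAsFactors = FALSE` matters here: if the column were a factor, `lookup[x]` would index by the integer codes instead of the labels.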

The main source of data is the file kul100od1001.csv, which contains 70,967 records and 33 variables:

dim(df.dogs)
## [1] 70967    33
str(df.dogs)
## 'data.frame':    70967 obs. of  33 variables:
##  $ ReferenceYear  : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ DataStatusCoded: chr  "D" "D" "D" "D" ...
##  $ OwnerId        : int  126 574 695 893 1177 4004 4050 4155 4203 4215 ...
##  $ AgeV10Coded    : int  60 60 40 60 50 60 40 60 50 40 ...
##  $ AgeV10Text     : chr  "60- bis 69-Jährige" "60- bis 69-Jährige" "40- bis 49-Jährige" "60- bis 69-Jährige" ...
##  $ AgeV10Sort     : int  7 7 5 7 6 7 5 7 6 5 ...
##  $ OwnerSexCoded  : int  1 2 1 2 1 2 2 2 2 2 ...
##  $ OwnerSexText   : chr  "männlich" "weiblich" "männlich" "weiblich" ...
##  $ SexSort        : int  1 2 1 2 1 2 2 2 2 2 ...
##  $ DistrictCoded  : int  9 2 6 7 10 3 11 9 2 8 ...
##  $ DistrictText   : chr  "Kreis 9" "Kreis 2" "Kreis 6" "Kreis 7" ...
##  $ DistrictSort   : int  9 2 6 7 10 3 11 9 2 8 ...
##  $ QuarterCoded   : int  92 23 63 71 102 34 111 92 21 81 ...
##  $ QuarterText    : chr  "Altstetten" "Leimbach" "Oberstrass" "Fluntern" ...
##  $ QuarterSort    : int  92 23 63 71 102 34 111 92 21 81 ...
##  $ Breed1Text     : chr  "Welsh Terrier" "Cairn Terrier" "Labrador Retriever" "Mittelschnauzer" ...
##  $ Breed2Text     : chr  "Keine" "Keine" "Keine" "Keine" ...
##  $ MixedBreedCoded: int  1 1 1 1 1 1 1 1 2 3 ...
##  $ MixedBreedText : chr  "Rassehund" "Rassehund" "Rassehund" "Rassehund" ...
##  $ MixedBreedSort : int  1 1 1 1 1 1 1 1 2 3 ...
##  $ BreedTypeCode  : chr  "K" "K" "I" "I" ...
##  $ BreedTypeLong  : chr  "Kleinwüchsig" "Kleinwüchsig" "Rassentypenliste I" "Rassentypenliste I" ...
##  $ BreedTypeSort  : int  1 1 2 2 1 1 1 1 2 2 ...
##  $ DogBirthYear   : int  2011 2002 2012 2010 2011 2010 2012 2002 2005 2001 ...
##  $ DogAgeCoded    : int  3 12 2 4 3 4 2 12 9 13 ...
##  $ DogAgeText     : chr  "3-Jährige" "12-Jährige" "2-Jährige" "4-Jährige" ...
##  $ DogAgeSort     : int  3 12 2 4 3 4 2 12 9 13 ...
##  $ DogSexCoded    : int  2 2 2 2 1 1 1 1 2 2 ...
##  $ DogSexText     : chr  "weiblich" "weiblich" "weiblich" "weiblich" ...
##  $ DogSexSort     : int  2 2 2 2 1 1 1 1 2 2 ...
##  $ DogColorText   : chr  "schwarz/braun" "brindle" "braun" "schwarz" ...
##  $ NumberOfDogs   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ AlterVHundSort : int  3 12 2 4 3 4 2 12 9 13 ...

As the structure shows, the data set combines integer and character variables. Most attributes are recorded three times: twice as integers (the Coded and Sort variants) and once as a string (the Text variant). Depending on how each attribute is used in the study, one of the three variants was selected, so the relevant variables can be summarized as follows:

Numerical values:

Binary variables:

String values:
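When the Text variant of an attribute is used, the matching Sort column can still supply a meaningful level order. By default, `factor()` orders levels alphabetically, so "Kreis 10" comes before "Kreis 2"; this is also the level order visible in the model summary in Section 3.1. The sketch below assumes only the Text/Sort naming pattern from the `str()` output; `sorted_factor()` is a hypothetical helper and the rows are toy values.

```r
# Build a factor from a *Text column whose level order follows the
# matching *Sort column (e.g. DistrictText ordered by DistrictSort)
sorted_factor <- function(text, sort_key) {
  levels_in_order <- unique(text[order(sort_key)])
  factor(text, levels = levels_in_order)
}

# Toy rows (illustrative values only)
toy <- data.frame(
  DistrictText = c("Kreis 9", "Kreis 2", "Kreis 10", "Kreis 1", "Kreis 2"),
  DistrictSort = c(9, 2, 10, 1, 2),
  stringsAsFactors = FALSE
)

# Select all Text variants by their name suffix
text_cols <- grep("Text$", names(toy), value = TRUE)

# Default factor(): alphabetical levels, "Kreis 10" sorts before "Kreis 2"
levels(factor(toy$DistrictText))

# Sort-based levels: natural district order
toy$District <- sorted_factor(toy$DistrictText, toy$DistrictSort)
levels(toy$District)
```

Since the first level of a factor becomes the reference level in `lm()`, this ordering changes only the layout of the coefficient table here (Kreis 1 stays the reference either way).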

For the map plots, the original data set was complemented with the GeoJSON file stzh.adm_stadtkreise_a.geojson, which provides the district geometries. The two data sets were merged on their district name variables, which follow the naming convention of the City Council of Zürich.
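The merge on district names can be sketched as a left join onto the geometry side, so that every district keeps its row even without dog records. Keys in the dog data that have no polygon, such as the "Unbekannt (Stadt Zürich)" category that appears in the model output in Section 3.1, are worth listing before plotting. This is a minimal sketch with toy data frames: `geo_attrs` stands in for the attribute table of the GeoJSON (for example as read with `sf::st_read()`), and its column `district_name` is a hypothetical name; the real property name in the file may differ. The sf package provides a `merge()` method for sf objects, so the same call pattern should carry over when the first argument is the sf object.

```r
# Toy dog counts per district (illustrative numbers only)
dog_counts <- data.frame(
  DistrictText = c("Kreis 1", "Kreis 2", "Unbekannt (Stadt Z\u00fcrich)"),
  DogCount     = c(10, 20, 1),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the GeoJSON attribute table
geo_attrs <- data.frame(
  district_name = c("Kreis 1", "Kreis 2", "Kreis 3"),
  stringsAsFactors = FALSE
)

# District names in the dog data without a matching polygon
unmatched <- setdiff(dog_counts$DistrictText, geo_attrs$district_name)

# Left join onto the geometry side: districts without dogs get NA
merged <- merge(geo_attrs, dog_counts,
                by.x = "district_name", by.y = "DistrictText",
                all.x = TRUE)
merged
unmatched
```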


3. Models

3.1. Model 01: Linear Model

Dog count by district over time
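The model below is fitted to a yearly count table, dog_count_per_neighborhood_year, with one row per year and district. One way to derive such a table from df.dogs is sketched here, assuming that DogCount is the sum of NumberOfDogs per ReferenceYear and DistrictText; counting rows instead would be the alternative if each record is one dog. The toy rows are illustrative only.

```r
# Toy rows with the relevant df.dogs columns (illustrative values only)
toy_dogs <- data.frame(
  ReferenceYear = c(2015, 2015, 2015, 2016, 2016),
  DistrictText  = c("Kreis 1", "Kreis 1", "Kreis 2", "Kreis 1", "Kreis 2"),
  NumberOfDogs  = c(1, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Assumption: DogCount = sum of NumberOfDogs per year and district
counts <- aggregate(NumberOfDogs ~ ReferenceYear + DistrictText,
                    data = toy_dogs, FUN = sum)
names(counts)[names(counts) == "NumberOfDogs"] <- "DogCount"
counts
```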

ggplotly(fit01_ggplot)
## `geom_smooth()` using formula = 'y ~ x'
lm.counts.year <- lm(DogCount ~ ReferenceYear * DistrictText,
                     data = dog_count_per_neighborhood_year)
summary(lm.counts.year)
## 
## Call:
## lm(formula = DogCount ~ ReferenceYear * DistrictText, data = dog_count_per_neighborhood_year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -54.572 -16.100  -0.033  16.181  58.222 
## 
## Coefficients:
##                                                      Estimate Std. Error
## (Intercept)                                        -5.527e+03  7.289e+03
## ReferenceYear                                       2.800e+00  3.610e+00
## DistrictTextKreis 10                               -3.312e+04  1.031e+04
## DistrictTextKreis 11                               -1.047e+05  1.031e+04
## DistrictTextKreis 12                               -3.837e+04  1.031e+04
## DistrictTextKreis 2                                -9.000e+04  1.031e+04
## DistrictTextKreis 3                                -5.168e+04  1.031e+04
## DistrictTextKreis 4                                -2.623e+04  1.031e+04
## DistrictTextKreis 5                                -3.056e+04  1.031e+04
## DistrictTextKreis 6                                -3.980e+04  1.031e+04
## DistrictTextKreis 7                                -7.468e+04  1.031e+04
## DistrictTextKreis 8                                -2.999e+04  1.031e+04
## DistrictTextKreis 9                                -8.681e+04  1.031e+04
## DistrictTextUnbekannt (Stadt Zürich)                5.263e+03  1.170e+04
## ReferenceYear:DistrictTextKreis 10                  1.670e+01  5.106e+00
## ReferenceYear:DistrictTextKreis 11                  5.245e+01  5.106e+00
## ReferenceYear:DistrictTextKreis 12                  1.922e+01  5.106e+00
## ReferenceYear:DistrictTextKreis 2                   4.488e+01  5.106e+00
## ReferenceYear:DistrictTextKreis 3                   2.588e+01  5.106e+00
## ReferenceYear:DistrictTextKreis 4                   1.313e+01  5.106e+00
## ReferenceYear:DistrictTextKreis 5                   1.520e+01  5.106e+00
## ReferenceYear:DistrictTextKreis 6                   1.992e+01  5.106e+00
## ReferenceYear:DistrictTextKreis 7                   3.747e+01  5.106e+00
## ReferenceYear:DistrictTextKreis 8                   1.500e+01  5.106e+00
## ReferenceYear:DistrictTextKreis 9                   4.342e+01  5.106e+00
## ReferenceYear:DistrictTextUnbekannt (Stadt Zürich) -2.668e+00  5.798e+00
##                                                    t value Pr(>|t|)    
## (Intercept)                                         -0.758 0.450412    
## ReferenceYear                                        0.776 0.440152    
## DistrictTextKreis 10                                -3.213 0.001858 ** 
## DistrictTextKreis 11                               -10.157 2.49e-16 ***
## DistrictTextKreis 12                                -3.722 0.000354 ***
## DistrictTextKreis 2                                 -8.731 1.89e-13 ***
## DistrictTextKreis 3                                 -5.013 2.89e-06 ***
## DistrictTextKreis 4                                 -2.545 0.012742 *  
## DistrictTextKreis 5                                 -2.965 0.003933 ** 
## DistrictTextKreis 6                                 -3.861 0.000220 ***
## DistrictTextKreis 7                                 -7.244 1.83e-10 ***
## DistrictTextKreis 8                                 -2.910 0.004618 ** 
## DistrictTextKreis 9                                 -8.421 8.01e-13 ***
## DistrictTextUnbekannt (Stadt Zürich)                 0.450 0.654061    
## ReferenceYear:DistrictTextKreis 10                   3.271 0.001550 ** 
## ReferenceYear:DistrictTextKreis 11                  10.273  < 2e-16 ***
## ReferenceYear:DistrictTextKreis 12                   3.764 0.000307 ***
## ReferenceYear:DistrictTextKreis 2                    8.791 1.43e-13 ***
## ReferenceYear:DistrictTextKreis 3                    5.070 2.30e-06 ***
## ReferenceYear:DistrictTextKreis 4                    2.572 0.011839 *  
## ReferenceYear:DistrictTextKreis 5                    2.977 0.003790 ** 
## ReferenceYear:DistrictTextKreis 6                    3.901 0.000191 ***
## ReferenceYear:DistrictTextKreis 7                    7.338 1.19e-10 ***
## ReferenceYear:DistrictTextKreis 8                    2.938 0.004252 ** 
## ReferenceYear:DistrictTextKreis 9                    8.504 5.46e-13 ***
## ReferenceYear:DistrictTextUnbekannt (Stadt Zürich)  -0.460 0.646508    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.96 on 85 degrees of freedom
## Multiple R-squared:  0.9953, Adjusted R-squared:  0.9939 
## F-statistic: 718.6 on 25 and 85 DF,  p-value: < 2.2e-16
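In this interaction model, Kreis 1 is the reference level, so its yearly trend is the ReferenceYear coefficient (2.800 dogs per year, not significant with p = 0.44), while the trend of any other district is that coefficient plus the matching ReferenceYear:DistrictText interaction. For Kreis 11, for example, 2.800 + 52.45 = 55.25 additional dogs per year. The large negative district main effects (e.g. -1.047e+05 for Kreis 11) are intercept differences extrapolated to year 0 and are not meaningful on their own; centering ReferenceYear (e.g. at 2015) would make them interpretable. The helper `district_slopes()` below is a hypothetical sketch that performs this addition for every level. It is demonstrated on simulated data, not on the Zürich data, and as a check its slopes are compared with separate per-district regressions, which a fully interacted model reproduces exactly.

```r
set.seed(1)
# Simulated stand-in for dog_count_per_neighborhood_year (3 districts, 9 years)
sim <- expand.grid(ReferenceYear = 2015:2023,
                   DistrictText  = c("Kreis 1", "Kreis 2", "Kreis 3"),
                   stringsAsFactors = FALSE)
true_slope <- c("Kreis 1" = 3, "Kreis 2" = 45, "Kreis 3" = 25)
sim$DogCount <- 500 + unname(true_slope[sim$DistrictText]) *
  (sim$ReferenceYear - 2015) + rnorm(nrow(sim), sd = 20)

fit <- lm(DogCount ~ ReferenceYear * DistrictText, data = sim)

# Slope per district = ReferenceYear main effect + district interaction
# (the reference level has no interaction term, hence the leading 0)
district_slopes <- function(fit, year = "ReferenceYear", group = "DistrictText") {
  b <- coef(fit)
  lv <- fit$xlevels[[group]]
  inter <- paste0(year, ":", group, lv[-1])
  setNames(unname(b[year]) + c(0, unname(b[inter])), lv)
}
slopes <- district_slopes(fit)

# Check: separate simple regressions per district give the same slopes
separate <- sapply(split(sim, sim$DistrictText), function(d)
  coef(lm(DogCount ~ ReferenceYear, data = d))[["ReferenceYear"]])
all.equal(slopes, separate)
slopes
```

Applied to the fitted model, `district_slopes(lm.counts.year)` should return the sums described above, one per district, including the Unbekannt (Stadt Zürich) category.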

3.2. Linear model


4. Additional chapter

5. Conclusion

6. Appendix: Generative AI tools